@davemarco
Contributor

@davemarco davemarco commented Sep 17, 2025

Description

Modifies the Presto docs to support starting the webui.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

The package starts with the new webui when following these docs.

Summary by CodeRabbit

  • Documentation
    • Moved Presto settings to a package-level config block (storage_engine/query_engine) and updated examples and config file location.
    • Clarified database.host must be reachable (non-localhost) and noted Docker network separation; updated start/stop commands (docker compose up --detach, docker compose down).
    • Added presto.host/presto.port and Web UI host/port guidance; Presto need not run before CLP.
    • Introduced results_cache.retention_period (null to disable).
    • Reworked querying guidance into Web UI and Presto CLI sections, added UI query URL and single-query note.

@coderabbitai
Contributor

coderabbitai bot commented Sep 17, 2025

Walkthrough

Updated Presto documentation: moved CLP config to a package block (package.query_engine/package.storage_engine), added results_cache.retention_period: null, introduced a presto host/port block, adjusted Docker start/stop commands, and split querying guidance into CLP Web UI and Presto CLI workflows with networking notes.
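The config changes listed above can be sketched as a small check over the parsed `etc/clp-config.yml`. This is a minimal, hypothetical helper (`check_presto_package_config` is not part of CLP). It assumes the YAML has already been loaded into a plain `dict`, so `retention_period: null` shows up as `None`, and it only mirrors the keys named in this PR.

```python
from typing import Any, Dict, List


def check_presto_package_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper that mirrors only the keys named in this PR.
    """
    problems: List[str] = []

    # `or {}` also covers a YAML key that is present but empty (null).
    package = cfg.get("package") or {}
    if package.get("storage_engine") != "clp-s":
        problems.append('package.storage_engine must be "clp-s"')
    if package.get("query_engine") != "presto":
        problems.append('package.query_engine must be "presto"')

    # Assumption: the key must be present and explicitly null; an omitted key
    # is treated as "not disabled" since the package may apply a default.
    results_cache = cfg.get("results_cache") or {}
    if "retention_period" not in results_cache or results_cache["retention_period"] is not None:
        problems.append("results_cache.retention_period must be null")

    presto = cfg.get("presto") or {}
    for key in ("host", "port"):
        if key not in presto:
            problems.append(f"presto.{key} is not set")

    return problems


# The settings described in this PR, as a parsed dict.
example_cfg = {
    "package": {"storage_engine": "clp-s", "query_engine": "presto"},
    "results_cache": {"retention_period": None},
    "presto": {"host": "localhost", "port": 8889},
}
print(check_presto_package_config(example_cfg))  # []
```

Parsing the YAML itself (e.g. with a YAML library) is left out to keep the sketch dependency-free.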

Changes

  • Cohort: Docs — Presto integration config and access
  • File(s): docs/src/user-docs/guides-using-presto.md
  • Summary: Replaced database-centric config edits with an explicit etc/clp-config.yml package block (package.storage_engine: "clp-s", package.query_engine: "presto"); added results_cache.retention_period: null; introduced a presto block for host/port and non-localhost examples; clarified separation of Presto container networking and CLP host networking; updated YAML examples and notes; changed Docker flow (docker compose up --detach, docker compose down); removed old CLI-only querying flow and added dual pathways: "Querying from CLP's UI" and "Querying from the Presto CLI"; updated cross-references and minor wording tweaks.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor U as User
    participant W as CLP Web UI
    participant S as CLP Server
    participant P as Presto

    rect rgb(235,245,255)
    Note over U,W: Option A — Query via CLP Web UI
    U->>W: Submit query
    W->>S: Forward query (reads presto.host/presto.port)
    S->>P: Send query to Presto
    P-->>S: Return results
    S-->>W: Relay results
    W-->>U: Display results
    end
sequenceDiagram
    autonumber
    actor U as User
    participant C as Presto CLI
    participant P as Presto

    rect rgb(245,255,235)
    Note over U,C: Option B — Query via Presto CLI (run from tools/deployment/presto-clp or via Docker)
    U->>C: Run CLI query (point to presto.host:presto.port)
    C->>P: Execute query
    P-->>C: Return results
    C-->>U: Print results
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Docstring Coverage: ✅ Passed. No functions found in the changes. Docstring coverage check skipped.
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and concisely summarizes the primary documentation changes by stating that instructions to connect CLP's UI to Presto are being added and that some setup steps are being simplified. It directly matches the PR objectives and the file diffs which introduce UI-based querying guidance and revised setup commands. The phrasing is specific and free of noisy or vague terms, so a reviewer scanning history will understand the main change.

@davemarco davemarco marked this pull request as ready for review September 18, 2025 14:12
@davemarco davemarco requested a review from a team as a code owner September 18, 2025 14:12
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9888c11 and d38cbe7.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-presto.md (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: davemarco
PR: y-scope/clp#1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
📚 Learning: 2025-08-25T16:27:50.549Z

Applied to files:

  • docs/src/user-docs/guides-using-presto.md
🔇 Additional comments (2)
docs/src/user-docs/guides-using-presto.md (2)

36-44: Confirm storage_engine/query_engine example and explicitly tie to clp-json.

Looks good. Please explicitly state that this example applies to the clp-json flavour and that storage_engine must be clp-s for this guide, to avoid users mixing configs.

Apply this small doc tweak?

-2. Before starting the package, update the config (`etc/clp-config.yml`) as follows:
+2. Before starting the package, update the config (`etc/clp-config.yml`) as follows (for the `clp-json` package):
@@
   package:
     storage_engine: "clp-s"
     query_engine: "presto"

74-81: Document PRESTO env vars and package settings.
Update the docs to replace placeholders with the actual keys: host/port come from package settings (settings.json: PrestoHost, PrestoPort) and the environment-provided values are PRESTO_CATALOG, PRESTO_SCHEMA and USER (see components/webui/server/src/plugins/app/Presto.ts).
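The split described in this comment and in the learning above (host/port from package settings, user/catalog/schema from environment variables) can be sketched as follows. The key names `PrestoHost`, `PrestoPort`, `USER`, `PRESTO_CATALOG` and `PRESTO_SCHEMA` are taken from this review comment and have not been checked against `Presto.ts`; `load_presto_connection` and `PrestoConnection` are hypothetical names, and the sketch is Python rather than the webui's TypeScript.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PrestoConnection:
    host: str
    port: int
    user: str
    catalog: str
    schema: str


def load_presto_connection(
    settings: Mapping[str, object], env: Optional[Mapping[str, str]] = None
) -> PrestoConnection:
    """Combine package settings (host/port) with environment values.

    Hypothetical sketch; raises KeyError if a setting or variable is missing.
    """
    env = os.environ if env is None else env
    return PrestoConnection(
        host=str(settings["PrestoHost"]),  # package-configurable
        port=int(settings["PrestoPort"]),  # package-configurable
        user=env["USER"],                  # environment-specific
        catalog=env["PRESTO_CATALOG"],     # environment-specific
        schema=env["PRESTO_SCHEMA"],       # environment-specific
    )
```

Passing `env` explicitly keeps the function testable without touching the real process environment.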

Comment on lines 46 to 59
* Set the `database.host` key to a non-localhost hostname/IP.

```yaml
database:
#  type: "mariadb"
  host: <IP-address>
#  port: 3306
#  name: "clp-db"
```

:::{note}
This change is necessary since the Presto containers run on a Docker network, whereas CLP's database runs on the host network. So localhost refers to two different entities.
:::

Contributor

🧹 Nitpick

Recommend host.docker.internal over hard-coding host IP; clarify Linux note.

Hard-coding an IP is brittle. Suggest preferring host.docker.internal (available by default with Docker Desktop on macOS/Windows; on Linux with Docker Engine it usually needs an explicit `host-gateway` mapping) and add a Linux fallback.

Proposed doc change:

-    * Set the `database.host` key to a non-localhost hostname/IP.
+    * Set the `database.host` to `host.docker.internal` (recommended). If that doesn't resolve on your Linux distro, use your host's LAN IP or the Docker gateway IP.
@@
-        database:
-        #  type: "mariadb"
-          host: <IP-address>
+        database:
+        #  type: "mariadb"
+          host: host.docker.internal   # or your host IP / Docker gateway IP on Linux
         #  port: 3306
         #  name: "clp-db"
@@
-      This change is necessary since the Presto containers run on a Docker network, whereas CLP's database runs on the host network. So localhost refers to two different entities.
+      Presto runs inside Docker, while CLP's DB runs on the host. `localhost` would point to different networks. `host.docker.internal` bridges the two.
🤖 Prompt for AI Agents
In docs/src/user-docs/guides-using-presto.md around lines 46 to 59, the
instructions currently suggest hard-coding the host IP for database.host; update
the guidance to recommend using host.docker.internal instead of a fixed IP
(which is supported on macOS/Windows and on modern Docker for Linux), and add a
brief Linux fallback note explaining how to enable host.docker.internal or use
the host network or the host's IP (with a pointer to Docker/OS-specific steps)
so readers have a resilient alternative rather than brittle hard-coded
addresses.
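The reason `localhost` fails can be made concrete: inside a Presto container, a loopback address refers to the container itself, not to the host running CLP's database. The hypothetical helper below (`is_loopback_host`, not part of CLP) flags `database.host` values that would point back into the container. It uses only Python's standard `ipaddress` module, treats the literal name `localhost` as loopback, and does not resolve other hostnames.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if `host` is "localhost" or a loopback IP literal.

    Hypothetical helper: from inside a Presto container, such a value for
    `database.host` points at the container itself, not at the machine
    running CLP's database. DNS names are not resolved here.
    """
    name = host.strip().lower().strip("[]")  # accept bracketed IPv6, e.g. "[::1]"
    if name == "localhost":
        return True
    try:
        return ipaddress.ip_address(name).is_loopback
    except ValueError:
        # Not an IP literal (e.g. "host.docker.internal"); assume non-loopback.
        return False
```

Note that the whole `127.0.0.0/8` range counts as loopback, not just `127.0.0.1`.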

Comment on lines 60 to 73
* Set the `results_cache.retention_period` key to `null`. The CLP presto integration does not
yet provide support for garbage collection.

```yaml
results_cache:
#  host: "localhost"
#  port: 27017
#  db_name: "clp-query-results"
#  stream_collection_name: "stream-files"
#
#  # Retention for search results, in minutes. Set to null to disable automatic deletion.
  retention_period: null
```

Contributor

🧹 Nitpick

Call out unbounded cache growth and offer manual cleanup guidance.

Setting retention_period: null disables GC and can cause unbounded growth. Add an explicit warning and a pointer to manual cleanup steps.

Suggested addition:

       results_cache:
@@
-        #  # Retention for search results, in minutes. Set to null to disable automatic deletion.
+        #  # Retention for search results, in minutes. Set to null to disable automatic deletion.
           retention_period: null
       ```
+
+      :::warning
+      With `retention_period: null`, the results cache will grow without bounds. Plan periodic manual cleanup or set a finite retention once GC is supported.
+      :::
🤖 Prompt for AI Agents
docs/src/user-docs/guides-using-presto.md around lines 60 to 73: the docs
currently state that results_cache.retention_period can be set to null but do
not warn about unbounded cache growth; add a clear warning block right after the
retention_period example that states that setting retention_period: null
disables GC and can cause unbounded growth, and include a short recommendation
to plan periodic manual cleanup or set a finite retention once GC is supported,
plus a pointer or link to the manual cleanup steps or housekeeping guide
elsewhere in the docs.
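The `retention_period` semantics from the config comment (minutes, with `null` disabling automatic deletion) can be sketched as a cutoff computation. This is a hypothetical illustration, not CLP's garbage-collection code: `retention_cutoff` returns the time before which cached results would be eligible for deletion, or `None` when retention is disabled, which is the case where the cache can grow without bound.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retention_cutoff(now: datetime, retention_minutes: Optional[int]) -> Optional[datetime]:
    """Return the deletion cutoff for cached search results.

    Hypothetical sketch: results older than the returned time would be
    eligible for deletion. `None` (YAML `null`) disables deletion entirely,
    so nothing is ever removed and the cache can grow without bound.
    """
    if retention_minutes is None:
        return None
    if retention_minutes < 0:
        raise ValueError("retention_period must be non-negative or null")
    return now - timedelta(minutes=retention_minutes)


now = datetime(2025, 9, 18, 12, 0, tzinfo=timezone.utc)
print(retention_cutoff(now, 60))    # 2025-09-18 11:00:00+00:00
print(retention_cutoff(now, None))  # None
```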

Comment on lines 166 to 170
To query your logs through Presto, you can use either:

* The WebUI available at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
* The Presto CLI:
Contributor

🧹 Nitpick

Disambiguate CLP Web UI vs Presto Web UI; add link to Presto UI (port verification).

Bullet reads as “WebUI” but it’s CLP’s web UI. Since this PR adds “setting up presto webui”, include the Presto coordinator UI endpoint too and confirm it matches the configured port.

Suggested wording:

-To query your logs through Presto, you can use either:
+To query your logs through Presto, you can use either:
@@
-* The WebUI available at [http://localhost:4000](http://localhost:4000) (if you changed
-`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
+* The CLP Web UI at [http://localhost:4000](http://localhost:4000) (if you changed
+  `webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
+* The Presto Web UI at `http://<presto.host>:<presto.port>/` (e.g., `http://localhost:8889/`) to inspect queries and cluster state
 * The Presto CLI:
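Both browser endpoints discussed in this thread follow directly from the config values. A minimal sketch, assuming the defaults mentioned in the review (CLP webui at `localhost:4000`, Presto coordinator published on host port `8889`); `ui_urls` is a hypothetical helper that reads `webui.host`/`webui.port` and `presto.host`/`presto.port` from a parsed config `dict`.

```python
from typing import Any, Dict, Tuple


def ui_urls(cfg: Dict[str, Any]) -> Tuple[str, str]:
    """Return (CLP webui URL, Presto web UI URL) for a parsed clp-config.

    Hypothetical helper; the fallbacks are the defaults mentioned in this
    thread. IPv6 literals would additionally need square brackets.
    """
    webui = cfg.get("webui") or {}
    presto = cfg.get("presto") or {}
    clp_url = f"http://{webui.get('host', 'localhost')}:{webui.get('port', 4000)}/"
    presto_url = f"http://{presto.get('host', 'localhost')}:{presto.get('port', 8889)}/"
    return clp_url, presto_url
```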

Collaborator

@quinntaylormitchell quinntaylormitchell left a comment

coming along! only a few comments to address.

Comment on lines 166 to 170
To query your logs through Presto, you can use either:

* The WebUI available at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
* The Presto CLI:
Collaborator

Suggested change
To query your logs through Presto, you can use either:
* The WebUI available at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
* The Presto CLI:
To query your logs through Presto, you can use either the [webUI](#querying-in-the-webui) or the
[Presto CLI](#querying-from-the-presto-cli).
### Querying in the webUI
The webUI is available at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values).
### Querying from the Presto CLI
To access the Presto CLI, navigate to the `tools/deployment/presto-clp` directory in a new terminal.
Make sure Presto is still running in whichever terminal you originally started it from. Enter the
following command to start the CLI:

I think that this warrants being split into subsections, as it's awkward to go from the end of this list (the Presto CLI point) into an un-indented explanation of the Presto CLI.

Contributor Author

I thought about this, but there is text underneath about certain SQL commands that applies to both, and putting the two headers there makes it look like it only applies to the CLI.

Collaborator

ah very true, my bad! I do think that could be clearer; I will come up with a format that makes that clearer, and I'll include it with my next review

Co-authored-by: Quinn Taylor Mitchell <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-using-presto.md (1)

157-165: Use docker compose down for full cleanup (consider -v to remove volumes).

docker compose rm removes stopped containers but leaves networks/volumes. For a clean slate, prefer down and optionally -v.

-To clean up the Presto cluster entirely:
+To clean up the Presto cluster entirely:
@@
-docker compose rm
+docker compose down -v
♻️ Duplicate comments (4)
docs/src/user-docs/guides-using-presto.md (4)

46-55: Prefer host.docker.internal over hard‑coding a host IP (add Linux fallback).

Hard‑coding an IP is brittle across networks. Recommend host.docker.internal and document a Linux fallback.

-    * Set the `database.host` key to a non-localhost hostname/IP.
+    * Set the `database.host` to `host.docker.internal` (recommended). If that doesn't resolve on your Linux distro,
+      use your host's LAN IP or the Docker gateway IP.
@@
-          host: <IP-address>
+          host: host.docker.internal   # or your host IP / Docker gateway IP on Linux

62-74: Warn that retention_period: null disables GC and can grow unbounded.

Call this out explicitly to prevent surprises and suggest manual cleanup.

           retention_period: null
       ```
+
+      :::warning
+      With `retention_period: null`, the results cache can grow without bounds. Plan periodic manual cleanup
+      or set a finite retention once GC is supported.
+      :::

169-169: Disambiguate “UI” to “CLP Web UI” and add Presto Web UI link.

Avoid ambiguity and surface the coordinator UI for query/cluster inspection.

-You can query your compressed logs from CLP’s [UI](#querying-from-the-ui) or the [Presto CLI](#querying-from-the-presto-cli).
+You can query your compressed logs from the [CLP Web UI](#querying-from-the-ui) or the
+[Presto CLI](#querying-from-the-presto-cli). You can also inspect queries in the Presto Web UI at
+`http://<presto.host>:<presto.port>/` (e.g., `http://localhost:8889/` if following this guide).

203-216: Consistency: say “CLP Web UI” in headings and links.

Use “Web UI” consistently to match earlier phrasing and reduce ambiguity.

-### Querying from the UI
+### Querying from the CLP Web UI
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 83c9016 and 2156ce1.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-presto.md (3 hunks)
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-docs/guides-using-presto.md

203-203: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


207-207: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint-check (macos-15)

Collaborator

@quinntaylormitchell quinntaylormitchell left a comment

only a few comments.

I like the new layout of the "Querying your logs" section, I think that works well.

question: are we going for "webUI" or "UI"? my vote is for "webUI" as it more clearly indicates that it's accessed through a web browser. if you want to stick with "UI", you can ignore my related comments.

also: when I used the Presto webUI, I noticed that queries with semicolons (as described/suggested in the "Querying your logs through Presto" section) didn't work. Removing the semicolon makes them work. is that a known bug? and if so, will it be fixed by the time these docs are merged? if it's not fixed, we should give a warning in these docs.

@davemarco
Contributor Author

also: when I used the Presto webUI, I noticed that queries with semicolons (as described/suggested in the "Querying your logs through Presto" section) didn't work. Removing the semicolon makes them work. is that a known bug?

Yes, the webui can only take one query at a time, whereas the CLI can accept multiple. Since there is only one query, no semicolon is required. I don't know if it's a bug or not, but maybe we can add something that strips the semicolon. For now, I added a note in the webui. If you want, I could instead strip the semicolon from the examples.
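The workaround mentioned here (stripping the trailing semicolon before a query reaches the webui) could look like the following. This is a hypothetical sketch, not code from the webui: it removes trailing whitespace and semicolons from a single statement and leaves everything else untouched, so it does not try to detect or split multiple statements.

```python
def strip_trailing_semicolons(sql: str) -> str:
    """Remove trailing semicolons (and surrounding whitespace) from one SQL statement.

    Hypothetical helper: semicolons inside the statement, e.g. in string
    literals, are left alone; only terminators at the very end are removed.
    """
    text = sql.strip()
    while text.endswith(";"):
        text = text[:-1].rstrip()
    return text


print(strip_trailing_semicolons("SELECT 1;"))  # SELECT 1
```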

Collaborator

@quinntaylormitchell quinntaylormitchell left a comment

lgtm!

Member

@kirkrodrigues kirkrodrigues left a comment

I haven't tested yet, so I might have some more comments after.

# stream_collection_name: "stream-files"
#
# # Retention for search results, in minutes. Set to null to disable automatic deletion.
retention_period: null
Member

Suggested change
retention_period: null
retention_period: null

Contributor Author

i changed the whole indenting for this block in another commit

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f3043fe and d8ad955.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-presto.md (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (8)
docs/src/user-docs/guides-using-presto.md (8)

41-44: LGTM: package query_engine config

The example sets query_engine to "presto" alongside storage_engine; matches the integration flow.


46-55: Prefer host.docker.internal over hard-coding host IP

Recommend defaulting to host.docker.internal with a Linux fallback to avoid brittle, environment-specific IPs.

-    * Set the `database.host` key to a non-localhost hostname/IP.
+    * Set the `database.host` to `host.docker.internal` (recommended). If that doesn't resolve on
+      your Linux distro, use your host's LAN IP or the Docker gateway IP.
@@
-        database:
-        #  type: "mariadb"
-          host: "<non-local-ip-address>"
+        database:
+        #  type: "mariadb"
+          host: host.docker.internal   # or your host IP / Docker gateway IP on Linux
         #  port: 3306
         #  name: "clp-db"

56-61: Tighten the networking note; call out host.docker.internal

Clarify the split networks and mention the recommended bridge hostname explicitly.

-      This change is necessary because the Presto containers run on a Docker network, and CLP's
-      database runs on the host network. `localhost` will refer to a different entity in each of
-      those contexts. This limitation will be addressed in the future when we unify Presto and CLP's
-      deployment infrastructure.
+      Presto runs inside Docker while CLP's database runs on the host. `localhost` points to
+      different networks in each context. `host.docker.internal` bridges Docker → host on
+      macOS/Windows and modern Linux; otherwise use your host's IP or the Docker gateway IP.

66-75: Warn about unbounded cache growth when retention_period: null

Add an explicit warning so users plan housekeeping.

       ```yaml
       results_cache:
@@
           retention_period: null
       ```
+
+      :::warning
+      With `results_cache.retention_period: null`, the results cache can grow without bounds.
+      Plan periodic manual cleanup, or set a finite retention once GC is supported.
+      :::

77-85: Make YAML match stated defaults (localhost:8889)

Align the example with the paragraph above to reduce ambiguity.

       ```yaml
       presto:
-        host: "<ip-address>"
-        port: <port>
+        host: localhost
+        port: 8889
       ```

87-89: Document config split: host/port in config; user/catalog/schema via env

State the intentional split so users know where to set which values.

       :::{note}
       Presto doesn't need to be running before you start CLP.
       :::
+
+      :::{note}
+      CLP sets Presto host/port in package config, while user, catalog, and schema are provided via
+      environment variables. This split keeps package defaults stable and environment‑specific
+      values outside the package config.
+      :::

174-176: Disambiguate CLP UI vs Presto Web UI; add Presto UI link

Call the CLP UI out explicitly and include the Presto coordinator UI for inspection/monitoring.

-You can query your compressed logs in your browser from [CLP's UI](#querying-from-clps-ui), or
-from the command line using the [Presto CLI](#querying-from-the-presto-cli).
+You can query your compressed logs in your browser from [CLP's UI](#querying-from-clps-ui), or
+from the command line using the [Presto CLI](#querying-from-the-presto-cli). You can also inspect
+queries and cluster state in the Presto Web UI at `http://<presto.host>:<presto.port>/`
+(e.g., `http://localhost:8889/`).

77-85: Confirmed — docker‑compose exposes Presto coordinator UI on host port 8889

tools/deployment/presto-clp/docker-compose.yaml maps host 8889 -> container 8080 for presto-coordinator, so docs referencing localhost:8889 are correct (docs/src/user-docs/guides-using-presto.md: lines 77–85, 209–216, 218–229).

Contributor Author

@davemarco davemarco left a comment

kirk comments

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-using-presto.md (1)

148-161: Add Presto Web UI endpoint

Surface where to view cluster state and queries; aligns with this PR’s objective.

       ```bash
       docker compose up
       ```
 
       * To use more than one Presto worker, you can use the `--scale` option as follows:
@@
       * Replace `<num-workers>` with the number of Presto worker nodes you want to run.
+
+      The Presto Web UI is available at `http://<presto.host>:<presto.port>/`
+      (e.g., `http://localhost:8889/`) to inspect queries and cluster state.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d495f97 and 43a39cc.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-presto.md (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: davemarco
PR: y-scope/clp#1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
📚 Learning: 2025-08-25T16:27:50.549Z

Applied to files:

  • docs/src/user-docs/guides-using-presto.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (3)
docs/src/user-docs/guides-using-presto.md (3)

209-216: Scope the semicolon limitation to CLP’s UI; clarify CLI unaffected

Prevents confusion when switching to the CLI.

-:::{note}
-The UI can only run one query at a time, and queries must not end with a `;`.
-:::
+:::{note}
+CLP’s UI can only run one query at a time, and queries must not end with a `;`.
+This limitation does not apply to the Presto CLI.
+:::
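Queries entered in CLP's UI must not end with `;`. A small shell helper can normalize a query copied from a CLI session before you paste it into the UI. `clp_ui_query` is a hypothetical name and is not part of CLP. The sketch only trims trailing whitespace and semicolons. It does not parse SQL, so it will not catch a semicolon that sits inside a trailing comment.

```bash
# Hypothetical helper (not part of CLP): strip trailing whitespace and
# semicolons so a query can be pasted into CLP's UI.
clp_ui_query() {
  local q="$1"
  local trailing='[[:space:];]$'   # one trailing whitespace char or semicolon
  while [[ $q =~ $trailing ]]; do
    q="${q%?}"                     # drop the last character and re-check
  done
  printf '%s\n' "$q"
}
```

For example, `clp_ui_query 'SELECT 1;  '` prints `SELECT 1`. The Presto CLI accepts the original, semicolon-terminated form unchanged.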

224-229: Note CLI supports semicolons and multi‑statements

Set expectations consistent with Presto CLI behaviour.

 ```bash
 docker compose exec presto-coordinator \
   presto-cli \
     --catalog clp \
     --schema default
 ```

+The Presto CLI accepts semicolon‑terminated statements and supports multiple statements in one
+submission.


46-61: Prefer host.docker.internal over hard‑coding IP; add Linux fallback

Reduces brittleness and helps most users out‑of‑the‑box. Also tighten the note wording.



```diff
-    * Set the `database.host` key to a non-localhost hostname/IP.
+    * Set the `database.host` to `host.docker.internal` (recommended). If that doesn't resolve on
+      your Linux distro, use your host's LAN IP or the Docker gateway IP.

       ```yaml
-        database:
-        #  type: "mariadb"
-          host: "<non-local-ip-address>"
-        #  port: 3306
-        #  name: "clp-db"
+        database:
+        #  type: "mariadb"
+          host: host.docker.internal   # or your host IP / Docker gateway IP on Linux
+        #  port: 3306
+        #  name: "clp-db"
       ```

-      :::{note}
-      This change is necessary because the Presto containers run on a Docker network, and CLP's
-      database runs on the host network. `localhost` will refer to a different entity in each of
-      those contexts. This limitation will be addressed in the future when we unify Presto and CLP's
-      deployment infrastructure.
-      :::
+      :::{note}
+      Presto runs inside Docker, while CLP's DB runs on the host. `localhost` would point to different
+      networks. `host.docker.internal` bridges the two. This limitation will be addressed when we unify
+      Presto and CLP's deployment infrastructure.
+      :::
```

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 43a39cc and 51ded2a.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-presto.md (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: antlr-code-committed (macos-15)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (8)
docs/src/user-docs/guides-using-presto.md (8)

46-55: Prefer host.docker.internal over hard‑coding host IP; add Linux fallback note

Using a fixed IP is brittle. Recommend host.docker.internal with a brief Linux fallback.

-    * Set the `database.host` key to a non-localhost hostname/IP.
+    * Set `database.host` to `host.docker.internal` (recommended). On Linux where it isn't available,
+      use your host's LAN IP or the Docker gateway IP.

       ```yaml
         database:
         #  type: "mariadb"
-          host: "<non-local-ip-address>"
+          host: host.docker.internal   # or your host IP / Docker gateway IP on Linux
         #  port: 3306
         #  name: "clp-db"
       ```
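Inside the Presto containers, `localhost` refers to the container itself rather than to the host running CLP's database. The function below sketches a pre-flight check for this. It reads `database.host` from the CLP config and fails if the value is unset or a common loopback spelling (`localhost`, `127.*`, `::1`). `check_db_host` is a hypothetical helper and is not part of CLP. Its awk parsing assumes an unindented `database:` key with an indented `host:` line.

```bash
# Hypothetical pre-flight check (not part of CLP): fail if `database.host` in the
# CLP config is unset or a loopback address, because `localhost` inside the
# Presto containers does not refer to the host running CLP's database.
check_db_host() {
  local config="${1:-etc/clp-config.yml}"
  local host
  host=$(awk '
    /^database:[[:space:]]*$/ { in_db = 1; next }  # start of the database block
    /^[^[:space:]#]/          { in_db = 0 }        # another top-level key ends it
    in_db && $1 == "host:"    { gsub(/"/, "", $2); print $2; exit }
  ' "$config")
  case "$host" in
    "" | localhost | 127.* | ::1)
      echo "database.host is '${host:-<unset>}'; use an address reachable from Docker" >&2
      return 1
      ;;
  esac
  echo "database.host is '$host'"
}
```

Passing this check does not prove the address is reachable from the containers. For example, `host.docker.internal` passes the check, but on Linux it may not resolve without extra Docker configuration.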

56-61: Tighten and clarify the Docker networking note

Minor reword for flow and clarity.

-      This change is necessary because the Presto containers run on a Docker network, and CLP's
-      database runs on the host network. `localhost` will refer to a different entity in each of
-      those contexts. This limitation will be addressed in the future when we unify Presto and CLP's
-      deployment infrastructure.
+      Presto runs inside Docker while CLP's database runs on the host. `localhost` points to different
+      networks in each context. This limitation will be addressed when we unify Presto and CLP's
+      deployment infrastructure.

63-76: Warn about unbounded cache growth when retention_period is null

Without GC, the cache can grow until disk exhaustion.

       ```yaml
       results_cache:
@@
         retention_period: null
       ```
+
+      :::warning
+      With `retention_period: null`, the results cache can grow without bounds. Plan periodic manual
+      cleanup or set a finite retention once GC is supported.
+      :::

77-85: Make the Presto host/port example match the stated defaults (localhost:8889)

Reduces ambiguity and aligns with the paragraph above.

       ```yaml
       presto:
-        host: "<ip-address>"
-        port: <port>
+        host: localhost
+        port: 8889
       ```

87-89: Document config responsibility split: host/port in config; user/catalog/schema via env

Aligns with intentional design in the Web UI plugin.

       :::{note}
       Presto doesn't need to be running before you start CLP.
       :::
+
+      :::{note}
+      Set Presto host/port in package config. Provide user, catalog, and schema via environment
+      variables. This keeps package defaults stable and environment‑specific values outside the config.
+      :::

178-180: Include Presto Web UI alongside CLP UI and CLI

Surface the coordinator UI endpoint and tie it to the configured port.

-You can query your compressed logs in your browser from [CLP's UI](#querying-from-clps-ui), or
-from the command line using the [Presto CLI](#querying-from-the-presto-cli).
+You can query your compressed logs in your browser from [CLP's UI](#querying-from-clps-ui), inspect
+queries and cluster state in the Presto Web UI at `http://<presto.host>:<presto.port>/`
+(e.g., `http://localhost:8889/`), or use the [Presto CLI](#querying-from-the-presto-cli).

218-220: Scope the semicolon limitation to CLP’s UI; clarify CLI behaviour

Avoids confusion for CLI users.

-:::{note}
-The UI can only run one query at a time, and queries must not end with a `;`.
-:::
+:::{note}
+CLP’s UI can only run one query at a time, and queries entered there must not end with a `;`.
+This limitation does not apply to the Presto CLI.
+:::

228-233: Note that the Presto CLI accepts semicolons and multi‑statement submissions

Sets accurate expectations.

   presto-cli \
     --catalog clp \
     --schema default

+The Presto CLI accepts semicolon‑terminated statements and supports submitting multiple statements in one invocation.


Member

@kirkrodrigues kirkrodrigues left a comment

For the PR title, how about:

docs(presto-clp): Add instructions for connecting CLP's UI to Presto; Simplify some set up instructions.

@kirkrodrigues kirkrodrigues changed the title docs(webui): Update docs for presto integration to include setting up presto webui. docs(presto-clp): Add instructions for connecting CLP's UI to Presto; Simplify some set up instructions. Sep 23, 2025
@davemarco davemarco merged commit b20b37d into y-scope:main Sep 23, 2025
7 checks passed

```diff
 ```bash
-docker compose up
+docker compose up --detach
```
Member

it might be better to also add the --wait flag so users can be notified if any service fails to start successfully

@coderabbitai add an issue to track

Contributor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Create a GitHub issue to track the suggestion

gh issue create \
  --repo y-scope/clp \
  --title "docs(presto): Add --wait flag to docker compose up command in Presto documentation" \
  --body "## Description

Add the \`--wait\` flag to the \`docker compose up --detach\` command in the Presto integration documentation so users can be notified if any service fails to start successfully.

## Current command
\`\`\`bash
docker compose up --detach
\`\`\`

## Suggested command
\`\`\`bash
docker compose up --detach --wait
\`\`\`

## Rationale
The \`--wait\` flag ensures that the command waits until services are running and healthy before returning. This provides better feedback to users if any service fails to start successfully.

## References
- PR: https://github.com/y-scope/clp/pull/1310
- Comment: https://github.com/y-scope/clp/pull/1310#discussion_r2461661299
- Requested by: @junhaoliao" \
  --assignee junhaoliao
```

Length of output: 42


@junhaoliao, I've created issue #1495 to track adding the --wait flag to the docker compose up --detach command in the Presto documentation. This will help users get notified if any service fails to start successfully.
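Following the `--wait` suggestion, a wrapper like the sketch below starts the Presto services and reports a failure if any service does not become running or healthy. `--wait` is a `docker compose up` flag in Docker Compose v2 that implies detached mode. `start_presto` and the `COMPOSE_CMD` override are hypothetical names, introduced only so the function can be exercised without Docker.

```bash
# Sketch: start the Presto services and fail loudly if any service does not
# become running/healthy. COMPOSE_CMD is a hypothetical override (defaults to
# "docker compose") so the function can be tested without Docker.
start_presto() {
  local compose="${COMPOSE_CMD:-docker compose}"
  # Left unquoted on purpose so "docker compose" splits into command + subcommand.
  if ! $compose up --detach --wait; then
    echo "Presto services failed to start; check 'docker compose ps' and 'docker compose logs'." >&2
    return 1
  fi
  echo "Presto services are up."
}
```

With `--wait`, the command's exit status reflects whether the services reached a running or healthy state. Without it, the command returns as soon as the containers are started. This is the feedback the `--wait` comment asks for.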

Comment on lines +46 to +61
* Set the `database.host` key to a non-localhost hostname/IP.

```yaml
database:
#  type: "mariadb"
  host: "<non-local-ip-address>"
#  port: 3306
#  name: "clp-db"
```

:::{note}
This change is necessary because the Presto containers run on a Docker network, and CLP's
database runs on the host network. `localhost` will refer to a different entity in each of
those contexts. This limitation will be addressed in the future when we unify Presto and CLP's
deployment infrastructure.
:::
Member

(i figure this docs PR should provide better context, although the code for the Presto Docker Compose project was originally added in #1132.)

The mitigation documented above is practical but may be inconvenient for users operating in diverse network environments.

In the long term, we plan to integrate the Presto services into the CLP Package Docker Compose project introduced in #1178. Running `du -h tools/deployment/presto-clp` shows the directory is only 132 KB, indicating that the integration would not significantly increase the package size while substantially improving user convenience.
@coderabbitai please file an issue to track this long-term integration plan.

Since implementing and reviewing this integration may take time, a short-term solution is needed before the next release. After #1178, the webui service may also encounter connectivity issues with the Presto connector, as host network mode was removed in that update.
A likely mitigation is to allow the Presto project to attach to an external Docker network defined in the CLP Package Compose project.
@coderabbitai please create a separate issue to track this short-term fix.

Contributor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash

# Get the repository owner and name from the current directory
REPO="y-scope/clp"
PR_NUMBER="1310"
COMMENT_URL="https://github.com/y-scope/clp/pull/1310#discussion_r2462666131"

# Create the long-term issue for full Presto integration
echo "Creating long-term issue for Presto integration..."
gh issue create \
  --repo "$REPO" \
  --title "Integrate Presto services into CLP Package Docker Compose project" \
  --body "## Context

As documented in PR #1310, the current Presto integration requires users to set \`database.host\` to a non-localhost hostname/IP because the Presto containers run on a Docker network while CLP's database runs on the host network. This mitigation is practical but inconvenient for users operating in diverse network environments.

## Long-term Solution

Integrate the Presto services into the CLP Package Docker Compose project introduced in #1178. The \`tools/deployment/presto-clp\` directory is only 132 KB in size, so this integration would not significantly increase the package size while substantially improving user convenience.

## Benefits

- Eliminates the need for non-localhost hostname configuration
- Provides a unified deployment experience
- Simplifies network connectivity between CLP and Presto services

## References

- PR #1310: $COMMENT_URL
- Original Presto Docker Compose addition: #1132
- CLP Package Docker Compose project: #1178
- Requested by: @junhaoliao" \
  --assignee junhaoliao

echo ""
echo "Creating short-term issue for external Docker network..."

# Create the short-term issue for external Docker network attachment
gh issue create \
  --repo "$REPO" \
  --title "Allow Presto project to attach to external Docker network for CLP Package Compose" \
  --body "## Context

After #1178, the \`webui\` service may encounter connectivity issues with the Presto connector because host network mode was removed in that update. Since implementing and reviewing the full integration (tracked separately) may take time, a short-term solution is needed before the next release.

## Short-term Solution

Allow the Presto project to attach to an external Docker network defined in the CLP Package Compose project. This will enable proper connectivity between the webui service and Presto without requiring full integration.

## Implementation Notes

- This is a temporary mitigation until the full integration is completed
- Should be implemented before the next release
- Will resolve connectivity issues between webui and Presto services

## References

- PR #1310: $COMMENT_URL
- CLP Package Docker Compose project: #1178
- Requested by: @junhaoliao" \
  --assignee junhaoliao

echo ""
echo "Both issues created successfully!"
```

Length of output: 229


@junhaoliao, I've created both issues as requested:

  1. Long-term integration: #1497 - Integrate Presto services into CLP Package Docker Compose project

    • Addresses the inconvenience of requiring non-localhost hostname configuration
    • Proposes full integration of the 132 KB Presto project into the CLP Package Docker Compose
  2. Short-term fix: #1498 - Allow Presto project to attach to external Docker network for CLP Package Compose

Both issues have been assigned to you and include references to PR #1310 and the relevant context.


